This GitHub repository hosts a reinforcement learning Tic Tac Toe project. It contains a single Python file, TicTacToeRL.py. The repository currently has 0 stars and 0 forks.
This article details a method for training large language models (LLMs) for code generation using a secure, local WebAssembly-based code interpreter and reinforcement learning with Group Relative Policy Optimization (GRPO). It covers the setup, training process, evaluation, and potential next steps.
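To make the "group relative" part of GRPO concrete, here is a minimal sketch of the advantage computation, written independently of the article's own code. It assumes several completions are sampled for the same prompt and each receives a scalar reward; the function name, the reward values, the use of the population standard deviation, and the epsilon constant are all illustrative assumptions rather than details taken from the article.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to the other samples in its group.

    Assumption: advantage = (reward - group mean) / (group std + eps).
    Implementations differ on population vs. sample standard deviation;
    this sketch uses the population form.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt, for example
# 1.0 when the generated code passed its checks in the interpreter, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Completions that score above their group's average get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.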